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Abstract: We develop and apply an approach for analyzing multi-curve 
data where each curve is driven by a latent state process. The state at any 
particular point determines a smooth function, forcing the individual curve 
to “switch” from one function to another. Thus each curve follows what 
we call a switching nonparametric regression model. We develop an EM 
algorithm to estimate the model parameters. We also obtain standard errors 
for the parameter estimates of the state process. We consider several types 
of state processes: independent and identically distributed, independent 
but depending on a covariate and Markov. Simulation studies show the 
frequentist properties of our estimates. We apply our methods to a data set 
of a building’s power usage. 
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1. Introduction 

We develop and apply a method to analyze multi-curve data where each curve 
follows a switching nonparametric regression model (De Souza and Heckman, 
2014): each curve, over its domain, switches among J unobserved states with 
each state determining a function. The main goal is to estimate the function 
corresponding to each state and the parameters of the latent process, along 
with some measure of accuracy. 

We are motivated by the problem of calculating a building’s “typical curve” 
of energy consumption, that is, its expected energy consumption as a function of 
time and other variables (e.g., weather conditions). Such knowledge allows build¬ 
ing managers to compare the building’s real-time performance to its “typical” 
performance which is useful, for instance, for assessing the impact of improve¬ 
ments on a building’s energy efficiency. The data set we analyze was supplied 
to us by PulseEnergy, now part of EnerNOC. 

To understand our methodological approach, compare the plots in Figures 
1 and 2. Figure 1 shows hourly power usage during the months of June and 
July 2009 in an office building. On some days (holidays and weekends) energy 
usage is very close to zero. We can observe that on some business days the 
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energy usage is very high, approximately twice as much as on the other days. 

This high power consumption occurs on warm days, when the cooling system 
(also called the chiller) of the building was probably on. Figure 2 presents the 
building daytime power usage from 9am to 4pm on 44 business days in June 
and July 2009. Several types of curves can be observed: one type corresponds 
to days when the cooling system was probably on and another type when the 
cooling system was off. We can also observe that on some days the chiller turned 
on in the middle of the day. On one day the chiller went on, off and then on 
again. 

Brown, Barrington-Leigh and Brown (2012) consider the data in Figure 1 
using a very computer intensive method. They find the “typical curve” by ap¬ 
plying a local constant kernel smoother over an extremely large number of data 
points, and thus, their contribution to the analysis is mainly one of computa¬ 
tional efficiency. They do not consider the special structure we see in Figure 2. 

One shortfall of their smoothing method is that they do not model the abrupt 
changes in level of energy consumption, and thus their approach may over¬ 
smooth these changes. Since these changes are real features in the data, they 
should be modelled explicitly to better understand power usage. Our method 
exploits the structure of Figure 2 and differs from the approach proposed by 
Brown, Barrington-Leigh and Brown in two important ways: one, by treating 
each business day as a replicate and the other, by modelling abrupt changes by 
modelling the building power usage as arising from two functions, one function 
giving power usage when the chiller is off, the other function giving power usage 
when the chiller is on. The condition “chiller on”/‘‘off’ at any particular time 
cannot be observed directly. It can only be inferred from the data, thus forming 
a latent process. 

De Souza and Heckman (2014) present the case where there is a single real¬ 
ization, a single curve switching among J functions. In that paper, we consider 
two models for the latent process: one where the states are independent and 
identically distributed, the other where the sequence of states forms a Markov 
chain. In addition to estimating all parameters and functions, we derive stan¬ 
dard errors for the parameters of the latent process. In the present paper we 
extend our 2014 approach, considering the case when there are N curves, called 
replicates, with each replicate switching among J functions. We also consider 
a third type of latent state process, where the state depends on a time-varying 
covariate. 

Several authors have considered the single realization case, but from a Bayesian 
perspective with the smooth functions modeled as realizations of Gaussian pro¬ 
cesses. See, for instance, Tresp (2001), Rasmussen and Ghahramani (2002) and 
Ou and Martin (2008). These papers are discussed in more detail in De Souza 
and Heckman (2014). The paper of Ou and Martin (2008) also contains a 
Bayesian analysis of the replicate case. These three papers contain method¬ 
ology that can, in principle, lead to estimation of all functions and the latent 
variable process parameters. However, unlike our work, these three papers focus 
on estimation of just one function - the mixture. 

In a more recent related work, Langrock et al. (2015) consider generalized 
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additive models with a time component, where the predictor is subject to regime 
changes controlled by an underlying Markov process. The parameter estimates 
are obtained by a numerical maximum penalized likelihood approach. These 
authors focus on a single realization case and do not consider the replicate case. 

2. Overview of the proposed methodology 

We consider a data set with N replicates where replicate k contains n obser¬ 
vations yik,...,Unk and evaluation points xi,... ,Xn, which for simplicity are 
the same across replicates. Observation yik depends on Xi according to a hidden 
(unobserved) state Zik with possible state values in {1,..., J}. If Zik = j the 
expected response of yik is fj{xi). In this work, we assume the replicates are 
all generated from just one set of functions /i,...,/j, a reasonable assump¬ 
tion for the power usage data presented in Figure 2 and described in Section 
1. We consider three types of hidden states, those that are independent and 
identically distributed, those that follow a Markov structure and those that are 
independent but with distribution depending on some covariate(s). 

Our notation is as follows (with replicate index being fc = 1,..., iV). 

• Observed data: x = (a;i,..., Xn)^, fixed across replicates; covariate vectors 
vifc,.. .,v„fc; responses = (yf,..., y^)^, where yk = {yik, ■ ■ ■ ,yukV ■ 

• Hidden states: = (zf’,..., z^)”^, where Zk = {zik, ■ ■ ■, ZnkV, and Zik 

takes a value in {!,..., J}. 

• /jW = {fj{xi),...Jj{xn)V for j = and /z^(x) = {f,^^{xi), 

■ ■ ■ Jz^dXn))^ ■ 

We assume that the replicates, zi,..., z^v, are independent. Given the hid¬ 
den states Zk, Yk = /z^ (x) -I- efc, where ei,..., ejv, are independent and e^, has 
a multivariate normal distribution with mean equal to the 0 -vector and covari¬ 
ance matrix V, possibly depending on Zk- We denote this €k ~ MVN{0,V). 
Therefore, yi,. •. ,y 7 v are independent and, given the hidden states Zk, Yk 
MVN{f 2 ^ (x), V). We let 7 be the set containing /i(x),..., /j(x) and the pa¬ 
rameters in V. We assume that the distribution of each z^ is governed by a 
parameter vector a. Section 2.1 presents our different choices of V and a. 

Our goal is to estimate 9 = { 0 , 7 }, along with standard errors or some mea¬ 
sure of accuracy for the parameters in a. Similar to De Souza and Heckman 
(2014) we obtain the parameter estimates by maximizing 


N 


l{9) = Y, logp(yfc|0) + p{fi, ..., /j, Al,..., Xj), 


( 1 ) 


where p{yk\9) is the likelihood function based on the observed data from the 
fcth replicate and the exact form of P{fi ,..., /j, Ai,..., Aj) is chosen by the 
user. For our work, we set 
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The Aj’s are the smoothing parameters. 

The form of logp(yfc|0) is very complicated, since it involves the distribution 
of the latent Zifc’s. Therefore, we apply an Expectation-Maximization (EM) al¬ 
gorithm (Dempster, Laird and Rubin, 1977) to maximize (1). We can show (see, 
for instance, Cappe, Moulines and Ryden, 2005 and McLachlan and Krishnan, 

2008) that our EM algorithm generates a sequence of estimates, c > 1, 
satisfying Z(0(^+^)) > 

As in De Souza and Heckman (2014) one could also take a Bayesian approach 
by maximizing (1) with P arising from placing a Gaussian process prior on the 
//s. 

2.1. Choices of V and a 

We consider five models for V, the covariance of the residual error: V unre¬ 
stricted, V diagonal with either V = cr^I or with iith entry depending on the 
latent state, and two models generated from a “random intercept” covariance 
structure: a homogeneous random intercept model and a non-homogeneous ran¬ 
dom intercept models the latter with variability of the intercept depending on 
the value of the latent state. We will usually write in models where the 
variability depends on the latent state. However, sometimes we will omit the 
subscript z when referring to a general V. The unknown parameters in V are 
clear for our first two models. For the third model, the parameters in are 
crf,...,cr“. 

To define V following a homogeneous random intercept model, let yip = 
fzik {xi) + Cife- Suppose that 

tiik — T 

- {(5i,..., S]sf} and {cik for i = 1,..., n, k = 1,..., N} are independent; 

- dk’s are independent and identically distributed (iid) 

- CikS are iid N{0,a‘^). 

Therefore, V depends on only two parameters and can be written as 

Y = a^{l + dll^), (3) 

where I is an n x n identity matrix, 1 is an n-vector of ones and d = . 

Our data analysis (Section 6) requires the more complex covariance structure 
of a non-homogenous random intercept model, where the variance of the random 
intercept depends on the state. We define this model for the simple case, where 
there are J = 2 states. We assume that pip = fzik{i>ii) + ^zik,ik, where 

if ^ik — 1 then — 5p -t- 

- if Zik = 2 then e 2 ,ik = 4 + + dik] 

- Sk, dk, and e^fc are independent for i = 1,... ,n and k = 1,... ,N; 

- 6k s are iid fV(0,rf); 

- 'dfc’s are iid fV(0,T|); 

- CikS are iid N{0,a‘^). 
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Therefore, the covariance matrix for the non-homogeneous random intercept 
model is given by 

V,,=a^l + dai^ + d2U,ll), ( 4 ) 

where dj = j and is an n-vector with *th entry l{zik = 2). 

In our model a is the vector containing the parameters governing the dis¬ 
tribution of the hidden states. If the z^fe’s are iid^ then a is of length J with 
jth component equal to p{zik = j) = Pj- If the Zifc’s follow a Markov structure, 
that is, if p(zife|z(i_i)fc, • ■ ■, zik, a) = p(zifc|z(i_i)fc, a), i = 2,... TV, then the pa¬ 
rameter vector a consists of the initial probabilities, tt^ = p{zik = llo), and the 
transition probabilities, aij = p{zik = j|z(i_i)fc = I, a), j, T = 1,..., J. Note that 
the transition probabilities do not depend on i or k. 

In the case where the Zip’s are independent, with Zip’s distribution depend¬ 
ing on a vector of covariates = {l,vi^ik,V 2 ,ik, ■ ■ ■ ,yM,ik)'^j assume that 
p{zik = j|vife) = Pj{vik) follows a multinomial logistic regression model with 


P 7 (i /c) T' 

Jr-. ^ + PjlVi^ik H-h PjMVMdk = ^ik for j = 2, . . . , J 


’ Pl(Vjfe) 


so that 


and 


Pi(v*fc) = 


l + & 2 e' 


J /3^ 






J p/3,- v,fc 

J =2 


1 + E,='> 


for j = 2,..., J. 


In this case a contains all the regression coefficient vectors / 32 ,..., 13j. 


3. Parameter estimation 

We present the EM algorithm that we use to obtain the estimates of the param¬ 
eters in 9. In the M-step, we take the same approach as De Souza and Heckman 
(2014) and model each fj as a linear combination of K known cubic B-spline 
basis functions, so that /j(x) = , where (pj is the iV-vector of coefficients 

corresponding to fj and B is the n x K matrix with entries Bi^, = bi,{xi). 

The smoothing parameters, Ai,..., Aj, can be chosen by a data driven method 
or by eye. In Section 3.4, we propose and justify a leave-one-curve out cross- 
validation (CV) criterion to find the optimal Aj’s for the case when V is diagonal 
and use this method in our application. However, in our application when we 
assume a non-diagonal V and in our simulations, we choose the A^’s by eye. 

Let p(y^,z^|0) be the joint distribution of the observed and latent data 
given 9, also called the complete data distribution. The application of the EM 
algorithm to the replicate case is similar to that in the one realization case in 
De Souza and Heckman (2014). It is based on writing 

logp(y^,z^| 6 ») = logp(y^|z^, 6 ») -blogp(z^| 6 ») = £1(7) -I-£2(0)- 
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In the E-step we calculate 

= Ee(c)(logp(y^,z^|6»)|y^) = E^m (/:i(7)|y^) -t EgM (/:2(a)|y^)- 

( 5 ) 

In the M-step, we maximize S{9,= Q{9, 0*-°^ +-P(/i,..., /j, Ai,..., Aj) as 
a function of 9, to find our updated 

3.1. E-step 

Let s be an n-vector of possible hidden states, i.e., each entry of s is in {1,2,, J}, 
and let 

Pk{s)^‘'^ = p{'^k = s|y^, = p{zk = s|yfe, 

Then, since 

N 

= X! X! = s)logp(yfc|zfc = s,/s(x), Vs), 

k—1 all s’s 
N 

Ee(c)(/:i( 7 )|y^)) = X! X! logp(yfc|zfc = s,/s(x), V^) 

k—1 all s’s 


= -^log27r- [{yk - /s(x))^V^ ^(yfe - /^(x)) -t log |Vs|] . 

k—1 all s’s 

( 6 ) 


Similarly, we can write 

N 

Egio) (/: 2 (a)|y) = logp(zfe = s|a). (7) 

k—1 all s’s 

It is important to note that, in the building daytime power usage application, 
n (the length of s) is small so that, fortunately, in (6) and (7), the sum over all 
possible s’s is not too large. Note also that the calculation of Eg(c) (£i( 7 )|y'^) 
depends on the model for the Zips only via pk{s)E\ while Eg(a){C 2 {a) also 
depends on logp(z/j = s|a). 

Calculation ofpfc(s)('’) is easy, by first writing 

, Uc) _ p(y/c|zfc = s, X p(zfc = s\9^‘^'>) 
p{yk\9(‘^^) 

^ p(yfc|zfc = s,/s(x)(^), X p(zfc = s|a(^)) 

y] ^(yfclzfc = u,/u(x)('’\v('’)) X p(zfe = u|a('’)) 
all u’s 

We calculate p(yfc|zfc = s, /s(x), using the normality assumption for the 

distribution of yp given the hidden states zp- Thus, we need only calculate 
p{zk = s|a) for each latent state model, which is straightforward. 
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3.2. M-step via an ECM algorithm 

From (5) and the calculations in Section 3.1, we see that we want to find 
that maximizes 

1 ^ 

k—1 all s’s 

+ P(/i,...,/j,Ai,...,Aj) (9) 

N 

+ Pk{s)^''^^ogp{zk = s\a). ( 10 ) 

k—1 all s’s 

with respect to 0 = {a, /i(x),..., /j(x) and the parameters in V}, or at least 
satishes S* 0^‘='>) > S*^\0^’^'>). Note that is fixed and thus so are the 

We also consider the smoothing parameters, Ai,..., Aj, to be fixed. 
We apply a natural extension of the EM approach, the Expectation-Conditional 
Maximization (ECM) algorithm (Meng and Rubin, 1993), to find the sequence 
of 6 l(^)’s. 

Because (10) does not depend on the /^ ’s or V, the /^’s and V that max¬ 
imize S* are the /j’s and V that maximize ( 8 ) -I- (9). Therefore, the form of 
the maximizing fj’s and V will only depend on the model for the z’s via the 
P/c(s)(‘=)’s. 

The steps of the ECM algorithm can be summarized as follows. 

1. Hold V and the parameters in a fixed and maximize ( 8 ) -I- (9) with respect 
to the /j(x)’s, obtaining /i(x)(°+i),..., /j(x)(°+^). 

2. Hold the /j(x)’s and the parameters in a fixed and maximize ( 8 ) with 
respect to the parameters in V, obtaining 

3. Now hold the /j(x)’s and V fixed and maximize (10) with respect to the 
parameters in a, obtaining 

We now present details of the calculations of steps 1 to 3. 

1. Updating the fj(/x.) ’s (any V ). 

We propose a method to update the /j(x)’s that is straightforward and 
yields an estimate of f = (/i(x)^,...,/j(x)^)^ in a closed form. The 
trick is to write /s(x) in terms of /i(x),..., /j(x). To do this, let be 
the n-vector with ith element equal to 1 if Si = j, 0 else. Let Ig be the 
n by nJ matrix, Xg = [ diag(li_s) | ''' | diag(lys)]. Then we easily see 
that /s(x) = Xgf. Recall that fj(x) = Let B* be the nJ x KJ block 
diagonal matrix with each block equal to B and let 4> be the JRT-vector 
(j) = ,. ■., ■ Therefore f = B*()). Let R be the K x K matrix with 

entries R^j// = / b”{x)b”,{x) dx. Combining these calculations we see that, 
to hnd the fj’s that maximize (8) -I- (9), we must maximize, as a function 
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of (j), 

1 ^ 

"2^ [(yfc-XsB*</.fV-i(yfe-Z,B»]-</>^diag(AiR,...,AjR)</>. 

k—1 all s’s 

This expression is quadratic in (j) and is easily maximized in closed form. 

Let be this maximizing </) when we set V = So we let = 

2. Updating V. 

For a model with Vg = V, with no dependence on the state vector s and no 
restrictions on the form of V, we show in Section 1 of the Supplementary 
Material that is 

^ Pfe(s)^^^(yfc - /s(x))(yfe - /s(x))'^. (11) 

fc=l all s’s 


with /s(x) = /s(x)(°+^). Note that if the values of were non-random 
and known, then is a delta function and so V is similar to the 

sample covariance matrix of the yUs. 

When Vg = V follows a homogeneous random intercept model, according 
to our calculations in Section 1 of the Supplementary Material, we update 
the parameter estimates of the restricted V in (3) as follows. Let cr^ ('=+!) 
be 


1 / ^ 

= ]yf(n-l) E E - /s(x))'^(yfc - /s(x)) 

^ ^ Vfc^lalls’s 

1 ^ \ 

“ ^ E E [{yk - /s(x))^i]^ j, (12) 

k—1 all s’s / 

and be 

^2^E E Pfc(s)(“H(yfe-/s(x))^l]'- ^ (13) 

k—1 all s’s 

with replaced by Therefore = d('=+i) x 

The maximization in step 2 when Vg follows the non-homogeneous random 
intercept model is given in Section 1 of the Supplementary Material, for 
the case that J = 2 states. The ECM algorithm for diagonal Vg is given 
in Section 3.3. 

3. Updating a (any V ). 

We must maximize (10) with respect to a, with the calculations depending 
on the proposed model for the hidden states. 
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For iid pj = p{zif^ = j\<^) (10) becomes 

N n J N J 

fc=l all s’s i—lj = l fc=l all s’s j — 1 

where risj = X]"=i “ il) i-®-) number of entries in s that are 

equal to j. We maximize this as a function of pi,... ,pj, using a Lagrange 
multiplier for the restriction that ^/iPj = 1, obtaining: 

k—1 all s’s 


For Markov ZikS, where the vector a is composed of transition probabili¬ 
ties aij and initial probabilities tt^, we rewrite (10) as 


N 

A;=l all s’s 


n 

logTTsi +^logas,_iSi 
2=2 


= EE 

k—1 all s’s 


J J J 

YHsi = j)\og7T, + YY^- log aij 
1=1 1=1 i=i 


where Us^ij is the number of transitions in s from state I to state j, that 
is, Us^ij = X)r =2 I{'Si-i = Ksi = j}- We first maximize this with respect to 
aij, holding fixed, and then maximize with respect to tt^, holding the 
ajj’s fixed. Using Lagrange multipliers for the constraints — 1 

and ^1 = 1; obtain 

N 

E E 

(c+l) fc=lalls’s 

“b “ Uv n 

E E E 

fc=lalls’s 2=2 


dc-Hl) 


1 ^ 

]vE E Pk{sY"h{si=j). 

k—1 all s’s 


For Zik’s independent with distribution depending on some covariate(s), a 
contains the regression coefficients from our logistic regression model for 
p{zik = j|vjfc,a) =pj{vik,a). In this case, (10) becomes 

N n J 

k—1 all s’s 2=1 j—1 

which must be maximized numerically, for instance, via a Newton-Raphson 
method. 
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3.3. ECM algorithm when V is diagonal 


Recall that we consider two cases of V diagonal, one with V = and one with 
V = V^k = diag((T 2 j^,..., o'z„fc)- We could use the notation and steps of Section 
3.2, modifying Step 2 for these V’s. However, it is much easier to re-derive all 
three steps using the independence of the components of to rewrite £ 1 ( 7 ), 
and thus S'( 6 », in a simpler form. We will see below that, instead of the 
Pk (s)W’s in ( 8 ) and ( 10 ), we require the simpler 

=p{z,k = 


The forms of Pik{j)^‘^'^ are given in the Appendix A 

Here, we carry out the calculations of the ECM algorithm for the case that 
= diag(cr^^^,..., then provide the results for the case that V = cr^I. 

For Vz,,, we can write 


N 


^ 1 ( 7 ) = log p{y^k\zik]fzAxi),cjl.J 


k—1i=l 

^ N n J 




k=i i=i j=i 


log(27r) +logCT- 


{Vik - 


Therefore, 


N n j 

= -2 log'll (14) 

k—1i—1 j—1 
N J 

“2 “ /i(x))^Wfej(yfe - /j(x)) (15) 

k=l 3 = 1 

+ P(/i,...,/j,Ai,...,Aj) (16) 

+ Ee,.,{C2{a)\y^), (17) 

where 

Wfej = cr)-^diag(pifc(j)l'=l,... ,Pnk{jf‘'^). (18) 

To maximize S*{9, as a function of 0 = {/j(x), cr|, j = 1,..., J, and a} 
we apply the ECM algorithm as follows. 

1. Updating the /j(x) ’s. Holding the (t|’s and the parameters in a fixed and 
maximizing (15) + (16) with respect to /j(x) we obtain 

N 

/,(x) = EHfc.(A,)yfc, (19) 
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where 


Hfc,(A) =B 


N 


B + 2AR 


B^Wfc,. 


( 20 ) 


We let be /j(x) with a] in Wfej replaced by cr/^ ■ 

2. Updating the crfs. Now holding the /j(x)’s and a fixed and maximizing 
(14) + (15) with respect to aj we get 


JV n 

[vik - 

k—li—1 

N n 

k—li—1 


Let be a'j with fj{xi) = 

3. Updating a. Now we hold the /j(x)’s and the cr|’s fixed and maximize (17) 
with respect to the parameters in a. Recall that £ 2 ( 0 ?) = log[p(z^|a)]. For 
iid ZikS we write 


E,(c)(/:2(a)|y'') 


N J 


Eg(c) EEE \ogPj 1{ 


.k—l z=l j — 1 
N J 


EEE^os^t p^k{j)^‘'^- 



Maximizing with respect to pj, with the constraint that ~ 

yields 


P 


(c+i) _ 


N n 




k—li—1 

Similiarly, for Markov Zik^s, we calculate 


N n 

EE^^^o-i)* = = j\yk,o^‘''*) 

(c+l) ^ k=l i=2 _ 

I3 N n 

E Ep(^b-i)fc = 

k—l1—2 


and 


dc+l) _ 


N 

jv£ 


Plk 




For Zifc’s independent with distribution depending on some covariates 
we need numerical optimization methods, such as Newton-Raphson, to 
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obtain the coefficient estimates from our logistic regression model for 
p{zik = j|vifc) = pj{vik)- So, for example, if there are J = 2 states and 
the covariate vector is we apply a numerical method to 

obtain /32o and /32i that maximize 

N n 

EgCc) (/l2(/320)/32l)|y'^) = EE {pik (2)E) (/320+/32ll^ifc)~log(l+e^^“^^^^''’'“ ) } . 

k—1i—1 

All of the calculations in this section are easily modified for the case that 
V = CT^I: simply replace the with . For instance, step 2 becomes 

N n J 

^2(^+1) = _. 

Nn 

3.4- Choice of the smoothing parameters 

In principal, we can always compute the smoothing parameters by “leave-one- 
replicate-out” cross-validation. However, for many models, this can be com¬ 
putationally intensive. Fortunately, in the models with V = cr^I or = 
diag((T^j^^,..., 0-2^^), we can shorten calculations by using Theorem 1 below. In 
this section, we describe our iterative cross-validation procedure, implemented 
in our data analysis in Section 6.1. We describe our iterative procedure for 
= diag(cr^^j^,..., 0-2^^)• The steps for V = cr2i are the same except with 
replacing the (t|’s . 

In our data analysis we set the initial values, the A^-^^’s, to a value that worked 
well when tested on the data set. We update the A^-’s as follows. 

1. At iteration i, with A^ = Aj*\ j = 1,..., J, use the ECM algorithm of 
Section 3.3 to find the Pik{j)'s, the ctJ’s and the /j’s. 

2. Discard the /j’s from Step 1. 

3. Let Wfcj be as defined in (18) but with the aj’s andPifc(j)’s replacing 

the a^’s and Pifc(j)’s. Treat the ctJ’s and the Pikijfs and thus the W^^’s 
as fixed. 

4. For j = 1,..., J, over a grid of possible A values, set A^^^ as the value 
of A that minimizes the following leave-one-replicate-out cross-validation 
criterion: 


N 

CV,{X) =Y.[yk- /jy)(x)]^Wfc, [y, - /jy^x)] 




( 21 ) 
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where f ^ is the function that maximizes 

J A 

1 ^ 

= E [yr-/,(xEw,,[y.-/,(x)]+P(/„A) (22) 

r=l:r^k 

where, for our penalized log-likelihood approach, P{fj,X) = — A /= 
—A(/)jR0j. 

5. Repeat steps 1-4 with Xj = aE^\ j = !> • ■ • > tiH convergence. 

We use the final values of the Xj ’s to obtain all of the parameter estimates from 
the ECM algorithm as in Section 3.3. 

Finding A that minimizes (21) is computationally intensive. Fortunately, we 
can prove the following theorem. 

Theorem 1. Let fi\, ...jfjx be the maximizers of (15) + (16), with 
replaced by Let Hj-j be as in (20), but with replaced by W^j. Suppose 

that I — H/jj is invertible and ^kj is positive definite, j = 1,J. Then 


N rp 

CVfiX) = E [(I-Hfc,(A))-i(/,;,(x)-yfc)] A 


W 


kj 


(I-H,,(A))-i(/,;,(x)-yfc) 


k=l 


The proof follows directly from Lemma 2 in the Appendix B, which holds in a 
slightly more general setting. 


4. Standard errors for the parameter estimators of the state process 


As in the single realization case presented by De Souza and Heckman (2014) we 
use the results of Louis (1982) to obtain standard errors for the estimates of the 
parameters of the state process. For iid ZikS we consider J > 2 possible state 
values. For Markov Zik^s we restrict the possible number of states to J = 2 for 
ease of explanation and computing. 

Louis (1982) derived a procedure to obtain the observed information ma¬ 
trix when the maximum likelihood estimates are obtained using the FM al¬ 
gorithm. The procedure requires the computation of the gradient and of the 
second derivative matrix of the log-likelihood based on the complete data and 
can be implemented easily within the FM steps. We use Louis’s technique with 
known 7 to derive the formula for the information matrix associated with a. We 
then plug our final estimates, a and 7 , into this formula, yielding our estimated 
information matrix: 


i = £;„.^(-/:"(a)|y«) - (a)/:' (a)^|y«) 


(23) 


Q:=a,7=7 

where £ 2 ( 0 ;) = logp(z'^|a), CJ 2 W) is the gradient vector of £2 and is the 

associated second derivative matrix. Then I~^ is the estimated covariance of d 
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and the square root of the jjth element of is the standard error of the jth 
component of a. 

In the next sections we show how to calculate / for iid and Markov Zifc’s. 
Using similar techniques we also obtained standard errors for the intercept and 
slope parameters for the case when J = 2 and the Zik’s are independent with 
distribution depending on only one covariate. 


4 . 1 . Standard errors: iid zn-’s 


Setting the parameters in a equal to pi, ■ ■ ■ ,pj-i and noting that pj = 1 — 
consider the (J—l)x(J—1) matrix Eq(—£ 2 («)|y^) in (23) evaluated 
at a = a and 7 = 7 . One can show its jlth entry, for j ^ I, is equal to Nn/pj, 
where pj = 1 - Its jjth entry is 


Nn X 



Recall that nsj = ^27=1 li®* = j}- ^ne can also show that the (J—1) x (J—1) 
matrix EQ,(£ 2 (a)U 2 (a)^|y^) in (23) evaluated at a = d and 7 = 7 has off 
diagonal elements jl equal to 


N 


k—1 all s’s 


L hr - ^ hr - ^ 


N 


-E 


.all s’s 


Pj 


E ~ ! \ l ns,i ns,j \ \ / "■s.; Us.j 

Pk{s) - -r X > Pfc s) ^7 - 

\ n • n T / \ Til n T 


PJ 


Pi 


Pj 


PJ 


PJ 


Us,/ ^S,J 


all s’s 


Pi 


PJ 


and jth diagonal element given by 

N 


E E 


s J 

Pj 


risj 

PJ 


k—1 all s’s 

where Pk{s) = p(z/, = 5 |yfc,d, 7 ). 


N 


-E 

k^l 


E pJ^') 


.all s’s 


n. 


SJ ^s,J 

Pj PJ 


4-2. standard errors: Markov Zik’s 


For Markov Zi^s we show how to obtain standard errors for the estimates of the 
parameters of the z process for J = 2 possible state values. We apply Louis’s 
method to obtain standard errors for tti, 012 and 021 simultaneously. 

Recall that = YJi= 2 1('Sj-i = Ksi = j)- The 3x3 matrix Eq(—£"( a)|y'^) 
in (23) evaluated at a = d and 7 = 7 is diagonal with entries 


N 

itl{l - TTl)’ 


1 


N 


012(1 — 012 ) 


E E Pfc(s)E^(®*-i = 


k—1 all s’s 


2 = 2 


and 
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N 


a 2 i(l - 021) ^ 


Pfc(s)^I(si_i = 2). 


alls’s i—2 


The calculation of the 3 x 3 symmetric matrix rja{C 2 {a)C' 2 {a)^\y^) evaluated 
at a = d and 7 = 7 is straightforward but long. The matrix entries are given in 
Section 2 of the Supplementary Material. 


5. Simulation studies 

We carry out three simulation studies and consider that the Zik ’s can take values 
1 or 2. For each simulation study 300 independent data sets are generated, 
each with N = 100 replicates. In study 1, Zik, ■ ■ ■, Znk are iid and V follows 
the homogeneous random intercept model as in (3). In study 2, zik, ■ ■ ■, Znk 
follow a Markov structure and V also follows the homogeneous random intercept 
model. In study 3, Zik, ■. ■, Znk are independent, but with the distribution of Zik 
depending on a univariate covariate, Vik- In this third study, we take V = cr^I. 
In all three studies we use the same vector of evaluation points x and the same 
true functions fi and / 2 . The vector x = (xi,..., consists of n = 10 equally 
spaced points, 1,12,23,..., 89,100. The true function /2 is the same as used by 
De Souza and Heckman (2014) in their simulation studies. The true function fi 
is simply /2 — 0.1. In the third study, in each simulated data set, we generate 
Vik,k = 1,... ,n,i = 1,... ,N. Figure 3 presents examples of a data set from 
each simulation study with 20 of their 100 replicates. 

For Simulations 1 and 2 we generate each simulated data set as follows. 

1. Generate the z^^’s according to the specified model - iid for Simulation 1, 
Markov for Simulation 2. For the iid model, we set pi = p{zik = 1) = 0.5. 
For Markov z^^’s, we set transition probabilities 012 = p{zi = 2|zi_i = 
1) = 0.3 and 021 = p{zi = l|zi_i = 2) = 0.4 and initial probabilities 
TTl = 7r2 = 0.5. 

2. Generate the i/i^’s according to the homogeneous random intercept model 
of Section 2.1 with = 10“^ and cr^ = 10“®. 

3. Repeat steps 1 and 2 N = 100 times obtaining a data set of 100 replicates. 

For Simulation 3 we generate each simulated data as follows. 

1. Generate Vik’s iid N{0, 1). 

2. Generate the Zi^s such that P{zik = l\vik} = 1/[1 +exp(/3o + fdiVtk)] and 
so log[p 2 {vtk)/pi{vik)] = /3o + PlV^k■ We set /3o = 2 and /3i = 5. 

3. Generate the yik’s as follows. If Zik = 1 then pik = /i(xi) + Cik- If Zik = 2 
then pik = f 2 ixi) + Cik- The Cifc’s are iid a^). We set cr^ = 10“®. 

4. Repeat steps 1, 2 and 3 N = 100 times obtaining a data set of 100 repli¬ 
cates. 

We analyze the data using the proposed EM algorithm. We set initial pa¬ 
rameter values to be the true parameter values to speed up computation. We 
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did try initial values that were different than the true parameter values and the 
EM algorithm also converged, but it took longer than when starting from the 
truth, as expected. 

The values of Ai and A 2 are fixed and equal to 10“^ in all simulation studies. 

The value 10“^ was chosen by eye when tested on few simulated data sets. 

5.1. Results 

The three plots in Figure 3 show the fitted values /i(x) and / 2 (x) (dashed 
curves) for a simulated data set from each of simulation studies 1, 2 and 3. 

We assess the quality of the estimated functions via the pointwise empirical 
mean squared error (EMSE) as shown in Figure 4. We observe that in all simu¬ 
lation studies /i(x) and / 2 (x) produce very small values of EMSE (< 2 x 10“®). 

However, Simulation 3’s EMSE values for /i(x) are larger than in Simulations 
1 and 2 . 

Table 1 presents the mean and standard deviation of all 300 estimates of 
cr^ and for Simulations 1 and 2. Table 2 presents the mean and standard 
deviation of all 300 estimates of for Simulation 3. We can observe that in 
all three simulation studies we are slightly underestimating the values of these 
variance parameters. This may be due to the challenges of correctly adjusting 
the degrees of freedom in the estimates, in order to account for the estimation 
of the fj’s. 

Table 3 contains the mean and the standard deviation of the estimates of 
the parameters of the latent process for each simulation study, along with the 
averages of our proposed standard errors. Note that the standard deviations 
of the estimates are close to the values of the means of the proposed standard 
errors (s.e.’s), as desired. Table 3 also shows the empirical coverage percentages 
of both a 90% and a 95% confidence interval. We consider confidence intervals of 
the form “mean of the parameter estimates zLza/ 2 X proposed s.e.”, where Zaj^ 
is the a/2 quantile of a standard normal distribution with a = 0.1 and 0.05. The 
empirical coverage percentages for all three simulation studies are very close to 
the true level of the corresponding confidence interval. 

6. Analysis of the power usage data 

The data shown in Figure 2 consist of daytime hourly power usage of a building 
from 9am to 4pm (n = 8 observations in a day) on TV = 44 business days in 
June and July 2009. For the same days and hours we also have available the 
temperature near the building. We apply our proposed methodology to these 
data treating each day as a replicate and modelling power usage as arising from 
J = 2 functions, one function giving power usage when the chiller is off (/ = 1), 
and the other function giving power usage when the chiller is on (j = 2). In 
Section 6.1 we present the results assuming the covariance matrix V is diagonal 
and in Section 6.2 we present the results when we assume V is generated by the 
non-homogeneous random intercept model as in (4). 
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6.1. Results: diagonal V 

In this section we consider two models for V: V = a^I and V = Vz^, = 
diag((jf^j^,... We use the ECM algorithm described in Section 3.3 to 

estimate the model parameters considering iid Zik’s, Markov Zik’s and Zip’s that 
are independent with distribution depending on temperature. The smoothing 
parameters, the A^’s, are chosen by cross-validation as described in detail in 
Section 3.4. 

Figures 5a and 5b present the fitted functions for iid hidden states Zips when 
we assume V = cr^I and Vz^, respectively. We can observe that the fitted curves 
are very similar in the two figures. The estimated curve giving power usage when 
the chiller is on, obtained assuming V = cr^I, is slightly smoother than the one 
obtained assuming Vz^,. Table 4 presents the parameter estimates and chosen 
Aj’s. We can see that the estimates of pj = p{zik = j) from the two models for 

V agree within the reported standard errors. We also observe in the lower half 
of the Table that the estimated variance when the chiller is on is much higher 
than when the chiller is off. 

Figures 6 a and 6 b present the fitted curves for Markov Zik’s when we assume 

V = cr^I and Vz^, respectively. As in the iid case, the htted curve giving power 
usage when the chiller is on obtained assuming V = cr^I is slightly smoother than 
the one obtained assuming Vz^,. Table 5 provides information on the estimated 
model parameters and the chosen smoothing parameters. As in the iid case, 
the estimated variance when the chiller is on is much higher than when the 
chiller is off. We observe that the estimates of 021 , the transition probability 
from “chiller on” to “chiller off’, are very small or equal to zero. Any estimate 
of 021 is expected to be small, as there is only one replicate in the data set 
where we see this transition. The estimate of zero is reasonable when we assume 
different variances; 021 is zero because the transition happens gradually, which 
our model does not allow, and the method incorrectly classihes all observations 
as coming from the condition “chiller on”, failing to detect the transition. This 
replicate is the green curve in Figure 6 b. 

Figure 7 presents the fitted curves when we assume the Zik’s are independent 
with distribution depending on temperature via the following logistic regression 
model: 

p(chiller on | temperature) 

log -- 7 -r = l3o+ Pi temperature. 

pfchiller off | temperature) 

Table 6 shows the corresponding estimated model parameters assuming Vz^, 
along with the chosen smoothing parameters, the Aj’s. We observe in Table 6 
that the standard error for is very small and by considering a confidence 
interval of the form jdi ± 1.96 x s.e.(/3i) we conclude that the coefficient /3i is 
statistically signihcant. 
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6.2. Results: correlated observations generated by the 
non-homogeneous random intercept model 

In the analyses of Section 6.1, we see that the variability in energy consumption 
when the chiller is on is higher than when the chiller is off. Thus, models such 
as V = cr^I or V following the homogeneous random intercept model may 
not be appropriate. Therefore, to model this heterogeneity in variance and the 
correlation between observations, we fit the proposed switching nonparametric 
regression model to the power usage data assuming the covariance matrix V is 
generated by the non-homogeneous random intercept model as in (4). We use 
the ECM algorithm described in Section 3 and in Section 1 of the Supplementary 
Material to obtain the parameter estimates. We conduct the analysis assuming 
the hidden states z^^’s are iid. The smoothing parameters, the A^ ’s, are fixed 
and chosen by eye. 

Table 7 presents the parameter estimates. We observe that the estimates of pi 
and p 2 in Table 7 agree within the reported standard errors with the estimates 
obtained in Table 4 where we assume the observations are uncorrelated. Figure 
8 shows the corresponding fitted curves. We can observe that the fitted function 
corresponding to the condition “chiller on” is lower than that in Figures 5 to 
7. The non-homogeneous random intercept model appears to “explain” days of 
high power usage by a larger variability of the “chiller on” random intercept. 
Thus the replicates with very high power usage have less of an impact on the 
final fitted “chiller on” curve. 

7. Discussion 

We have introduced a method for the analysis of data arising from random sam¬ 
ples of a process with a complex structure. The structure depends on a latent 
state process where each state corresponds to a true smooth regression function. 
The estimation techniques and standard error calculations were developed for 
several specific cases of state processes and error covariances. While the mod¬ 
els we considered may not capture all of the dependencies in a data set, our 
techniques and ideas should carry over to more complex latent state processes 
and richer time series modelling of the error process. Further useful extensions 
might incorporate a dependence among replicates; for instance, in studying en¬ 
ergy consumption of several buildings, one would want to incorporate a random 
“building” effect. 
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Appendix A: Forms of in the EM algorithm when V is 

diagonal 


When the life’s are independent, similarly to the one realization case of De Souza 
and Heckman (2014), we obtain 

Pikij)^"^ = p{ztk = j\y^k,0^'''>) 

_ piVik I Zik = j, fj {Xi )) X p/ 

Y.i=iP{yik\zik = X p/ 

with p{yik\zik = j, / easily calculated with our normality assump¬ 

tion. Here, p/ = p/ when the Zik’s are iid, and when Zik depends on the 

covariate v^fc, p/ = p{zik =lWik)- 
When the z^Ps are Markov, 


P^k{j)^‘'^ 




where = p{yik, ■ ■ ■ ,yik, z^k = and = piy(i+i)k, ■ ■ ■ ,ynk\zik = 

J, are obtained using, respectively, the forward and backward procedures 
proposed by Baum et al. (1970). 


Appendix B: Proof of Theorem 1 

Theorem 1 is based on the following lemmas, which frame the problem for fixed 
j and fixed A (so these are dropped in notation) and with general matrices Wr-, 
r = 1,..., A. Lemma 1 holds for general penalties, while Lemma 2 places further 
restrictions, restrictions that hold in our setting. Throughout, we assume that 
all maximizers exist. 

Let maximize 

5'(-fc)(j) = _i [yr-fiyi)/Wr[yr-f{x)]+P{f). 

r—l\r^k 

Lemma 1. Let /l*^) maximize 

= -i[/(-"nx)-/(x)]^Wfc[/(-'=)(x)-/(x)] 

H [Vr- f{^)f^r[yr- f{x)] + P{f). 

r=l,r^k 

Ify^k is positive definite then /*^“^^(x) = /(*^)(x). 
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Proof of Lemma 1. For simplicity let fc = 1. We want to show that /*■ = 

/(*!). We know /("i) maximizes ^\f) and, therefore, 

S'(-i)(/(-i))_5'(-i)(/(*i)) > 0. 

We also know that f^*^'> maximizes S^*^'>{f). Thus, S'(*i)(/(*i))_ 5 '(*i)(/(-i)) > 

0, that is, 

- i [/(-')(x) - />i)(x)]^Wi [/(-')(x) - 

-iffWr - f^*^H^)fWr[yr - f^*^H^)] + Pif^*^^) 

^ r ^2 
1 ^ 

+ 2 E [y- - - /'-'^x)] - P(/(-')) > 0, 

^ r ^2 

that is, 

-1 [/(-i)(x) - /(*i)(x)]^Wi [/(-^)(x) - /(*i)(x)] > > 0, 

which implies that 

[/(-i)(x) - />i)(x)]^Wi[/(-i)(x) - /(*')(x)] < 0, 
and, because Wi is positive definite, f^~^\x) = /(*^)(x). □ 

Lemma 2. Suppose that Wk is positive definite, k = 1,... ,N. Let f maximize 

S{f) =-^'/2[yk-f{x)/yVk[yk-fi^ic)] + P{f). 

^ k^l 

If there exist matrices Hk) A: = 1,..., N, not depending on the ^s, such that 
/(x) = Ef=i Ukyk, then 

(I-^fc) -yk] = f{x)-Yk. 


Proof of Lemma 2. Note that as defined in Lemma 1, is the maximizer of 
S with Yk replaced by By the assumption of the form of the maximizer 

of S, /(*^) (x) can be written as 


N 

/>'=)(x) = ^ HrYr+nkf^-'^H^) 

r=l:r/fe 

= f{x) -nkYk+'Hkf~''H^)- 
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From Lemma 1 we know *^(x) = Thus, 

/(-'=) (x) = /(x) - nkYk + nkf^~^\^)- 


Now subtracting yj, from both sides of this equation, we obtain 
(I - 'Hk)[f‘'~^\yi) - Yk] = /(x) - yfc. 


□ 
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Table 1 

Simulations 1 and 2: estimates of the covariance parameters cr'^ (true value = 1 x 



and (true value = 

1 X 10-^;. 

Simulation 

meanxlO® (SD* X 10®) 

meanxlO"^ (SD* X 10"^) 

1 

0.978 (0.046) 

0.977 (0.152) 

2 

0.978 (0.045) 

0.977 (0.152) 


*SD = standard deviation. 


Table 2 

Simulation 3: estimates of (true value = 5 x 10“^/ 
Simulation meanxlO^ (SD* x 10^) 

3 4.919 (0.238) 

*SD = standard deviation. 


Table 3 

Simulations 1, 2 and 3: estimates of the parameters of the z process. 


Sim’n 

t parameters 

mean (SD*) 

mean of s.e.’s 

empirical coverage 

90% 95% 

1 

pi = 0.5 

0.499 (0.016) 

0.016 

90.3% 

95.7% 

2 

TTl = 0.5 

0.502 (0.050) 

0.050 

89.7% 

95.7% 


0-12 = 0.3 

0.300 (0.021) 

0.020 

90.0% 

94.3% 


021 = 0.4 

0.401 (0.024) 

0.025 

89.7% 

95.3% 

3 

/3o = 2 

2.010 (0.173) 

0.176 

91.0% 

96.7% 


/5i = 5 

5.047 (0.357) 

0.364 

90.6% 

94.3% 


*SD = standard deviation. 


Table 4 

Data analysis results for iid Zi^ ’s for V = a^l and Vz^ = diag(<7^^^ , • • • > )? with 

corresponding fitted curves in Figures 5a and 5b, respectively. 



curve (chiller condition, j) 


Pj (s-e.) 


V = o-^I 

black (off, j = 1) 
red (on, j = 2) 

103.5 

0.665 (0.025) 
0.335 (0.025) 

0.020 

0.078 

V.. = 
diag(<^,.. 

black (off, j = 1) 

•. ) red (on, j = 2) 

12.7 

355.4 

0.658 (0.025) 
0.342 (0.025) 

0.073 

0.006 


Table 5 

Data analysis results for Markov z^^ ’s, for V = cj^I and Vz^ = diag{cr ‘^^^, • • •, )» 

corresponding fitted curves in Figures 6a and 6b. 



curve (chiller condition, j) 

^2 

TTj (s.e.) 

ai2 

(s.e.) 

“21 

(s.e.) 


V = aH 

black (off, j = 1) 
red (on, j = 2) 

103.1 

0.705 (0.069) 
0.295 (0.069) 

0.024 

(0.011) 

0.00991 

(0.00986) 

0.019 

0.084 

Vz, = 
diag(<T2^^,.. 

black (off, j = 1) 
red (on, i = 2) 

12.2 

400.2 

0.682 

0.318 

0.015 

< 10-1® 

0.050 

0.006 
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Table 6 

Data analysis results for Zi^ ’s with distribution depending on a covariate (temperature) and 
Vz^ = diag(cr‘^^^ »• • •, corresponding fitted curves in Figure 7. 


curve (chiller condition, j) 



$ (s.e.) 

^3 

black (off, j = 1) 

17.9 

^0 = - 

-13.015 (1.412) 

0.115 

red (on, j = 2) 

273.9 

$1 = 

0.607 (0.068) 

0.030 


Table 7 

Data analysis results for iid ’s and V depending on the hidden states generated by a 
random intercept model with Ai = A 2 = 0.1 and corresponding curves in Figure 8. 


curve (chiller condition, j) 

5-2 -2 p_ (g g ) 

black (off, j = 1) 
red (on, j = 2) 

12 4 9 3 472 0 0-672 (0.025) 
1Z.4 y.d 4/z.u g 228 (0.025) 


imsart-generic ver. 2013/03/06 file: 20161021_DeSouza_etal_Arxiv.tex date: October 25, 


2016 








24 




































De Souza et al./Switching nonparametric regression models for multi-curve data 25 


O 



Fig 2: Daytime power usage from 9am to 4pm on business days in June and July 
2009 in the same building as Figure 1. Each curve corresponds to a different 
day. 
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(a) Simulation 1: iid Zik’s (b) Simulation 2: Markov Zik’s 



(c) Simulation 3: covariate dependent Zik’s 

Fig 3: Example of simulated data along with /i(x) and / 2 (x) for each simulation 
study. The gray dashed curves correspond to 20 out of the 100 generated repli¬ 
cates. The black and red solid curves correspond to the true functions fi and / 2 , 
respectively, evaluated only at x. The black and red dashed curves correspond 
to /i(x) and / 2 (x), respectively. 
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(a) Simulation 1 


(b) Simulation 2 



(c) Simulation 3 

Fig 4: EMSE of fi{x) (black curves) and / 2 (x) (red curves) for all three simu¬ 
lation studies. 
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Hour of day Hour of day 


(a) (b) 

Fig 5: Building daytime power usage. Fitted function estimates (solid curves) 
assuming iid ZikS. In (a) we consider V = tr^I and in (b) = 

diag((T^^^,... The gray dashed curves correspond to the replicates. The 

red and black dashed curves are the initial function estimates. The colors red 
and black correspond to the condition chiller on and off, respectively. 



Fig 6: Building daytime power usage. Fitted function estimates (solid curves) 
assuming Markov ZikS- In (a) we consider V = a^I and in (b) = 

diag((T 2 ife) • ■ • Components of the plot are as in Figure 5a. The green 

curve in (b) corresponds to the replicate where there is a transition from chiller 
on to off. 
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Fig 7: Building daytime power usage. Fitted function estimates assuming the 
Zik’s are independent with distribution depending on temperature. = 

diag{al^^,... Components of the plot are as in Figure 5a. 
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Fig 8: Building daytime power usage. Fitted function estimates assuming iid 
Zik’s and V generated by a non-homogeneous random intercept model. Compo¬ 
nents of the plot are as in Figure 5a. 
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